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Abstract 

The estimation of multivariate GARCH time series models is a difficult task mainly due to the 
' ' significant overparameterization exhibited by the problem and usually referred to as the "curse of 

00 dimensionality". For example, in the case of the VEC family, the number of parameters involved in 
the model grows as a polynomial of order four on the dimensionality of the problem. Moreover, these 
parameters are subjected to convoluted nonlinear constraints necessary to ensure, for instance, the 
existence of stationary solutions and the positive semidefinite character of the conditional covariance 

r ■\ matrices used in the model design. So far, this problem has been addressed in the literature only in 

low dimensional cases with strong parsimony constraints (see for instance [ASPL03] for the diagonal 
^ three-dimensional VEC handled with ad-hoc techniques). In this paper we propose a general for- 

' mulation of the estimation problem in any dimension and develop a Bregman-proximal trust-region 

' . method for its solution. The Bregman-proximal approach allows us to handle the constraints in 

1 ^ I a very efficient and natural way by staying in the primal space and the Trust-Region mechanism 

stabilizes and speeds up the scheme. Preliminary computational experiments are presented and 
T-H confirm the very good performances of the proposed approach. 

> 

in 

t" — Key Words: multivariate GARCH, VEC model, volatility modeling, multivariate financial time 

series, Bregman divergences. Burg's divergence, LogDet divergence, constrained optimization. 

O 1 Introduction 

Autoregressive conditionally heteroscedastic (ARCH) models |Eng82| and their generalized counterparts 
J> (GARCH) }Bol86j are standard econometric tools to capture the leptokurticity and the volatility cluster- 

• '~j ing exhibited by financial time series. In the one dimensional situation, a large collection of parametric 

rS models that account for various stylized features of financial returns is available. Additionally, adequate 

model selection and estimation tools have been developed, as well as explicit characterizations of the 
conditions that ensure stationarity or the existence of higher moments. 

One of the advantages of GARCH models that makes them particularly useful is that once they have 
been calibrated they provide an estimate of the dynamical behavior of volatility which, in principle, is 
not directly observable. This feature makes desirable the extension of the GARCH prescription to the 
multivariate case since such a generalization provides a dynamical picture of the correlations between 
different assets which are of major importance, in the context of financial econometrics, for pricing and 
hedging purposes, asset allocation, and risk management in general. 
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This generalization is nevertheless not free from difficulties. The most general multivariate GARCH 
models are the VEC prescription proposed by BoUerslev et al |BEW88j and the BEKK model by Engle 
et al |EK95j ; both families of models present satisfactory properties that match those found in univariate 
GARCH models, nevertheless their lack of parsimony, even in low dimensions makes them extremely 
difficult to calibrate; for example, VEC(1,1) models require n(n+l)(n(n+l) + l)/2 parameters, where n is 
the dimensionality of the modeling problem; BEKK(1,1,1) requires n(5n+l)/2. Indeed, due to the high 
number of parameters needed, it is rare to find these models at work beyond two or three dimensions and 
even then, ad hoc estimation techniques are used and additional limitations are imposed on the model to 
make it artificially parsimonious; see for example [ASPL03] for an illustration of the estimation of a three 
dimensional DVEC model (VEC model with diagonal parameter matrices |BEW88j ) using constrained 
non-linear programming. These difficulties have lead to the search for more parsimonious but still 
functioning models hke for example CCC jBol90| . DCC [TT021 [E5g02] or GDC |KN98| . On a different 
vein, a number of different signal separation techniques have been tried out in the financial time series 
context with the aim of reducing this intrinsically multivariate problem to a collection of univariate ones. 
For example, principal component analysis is used in the 0-GARCH model |Din941 IAC971 1 Ale98l IAle03) 
and independent component analysis in the ICA-GARCH model |WYL06[ rCFCPPOS) . We advice the 
reader to check with the excellent reviews |BLR06[ IST09| for a comprehensive description of these and 
other models. 

Despite the overparameterization problem we will concentrate in this work on full fledge VEC models. 
This decision is taken not for the pure sake of generality but because the intrinsic difficulties of this 
parametric family of models make them an ideal benchmark for testing optimization techniques subjected 
to potentially complex matrix constraints. Stated differently, it is our belief that, independently from 
the pertinence of the VEC family in certain modeling situations, any optimization algorithm developed 
to estimate them will work smoothly when applied to more elementary situations. Hence, the work that 
we present in this paper is capable of increasing the range of dimensions in which VEC models can be 
estimated in practice by improving the existing technology in two directions: 

• Explicit matrix formulation of the model and of the associated stationarity and positivity con- 
straints: the works in the literature usually proceed by expressing the constraints in terms of the 
entries of the parameter matrices (see for example |ASPL03] in the DVEC case) . A global matrix 
formulation is necessary in order to obtain a dimension independent encoding of the problem. 
This task is carried out in Sections [2] and [3] 

• Use of a Bregman-type proximal optimization algorithm that efficiently handles the constraints in 
the primal space. More specifically, we will be using Burg's matrix divergence; this divergence 
is presented, for example, in |KSD09a] and it is a particular instance of a Bregman divergence. 



Bregman divergences are of much use in the context of machine learning (see for instance _ DT07[ 

IKSD09b] and references therein). In our situation we have opted for this technique as it allows 
for a particularly efficient treatment of positive definiteness constraints, as those in our problem, 
avoiding the need to solve additional secondary optimization problems that appear, for example, 
had we used Lagrange duality. It is worth emphasizing that even though the constraints that we 
handle in the estimation problem admit a simple and explicit conic formulation well adapted to the 
use of Lagrange multipliers, the associated dual optimization problem is in this case of difficulty 
comparable to that of the primal so avoiding this extra step is a major advantage. This approach 
is presented in Sections |4.1| and |4.2| In Section |4.3| we couple the use of Bregman divergences 
with a refinement of the local penalized model using quadratic BFGS type terms and with a trust- 
region iteration acceptance rule that greatly stabilizes the primal trajectory and improves the 
convergence speed of the algorithm. Finally, given the non-linear non-convex nature of estimation 
via quasi-loglikelihood optimization, the availability of good preliminary estimation techniques is 
of paramount importance in order to avoid local minima; this point is treated in Section [4 . 4| where 
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some of the simpler modeling solutions listed above are used to come up with a starting point to 
properly initialize the optimization algorithm. 

In Section [5] we illustrate the estimation method proposed in Section [4] with various numerical 
experiments that prove its applicability and support the following statements: 

• The trust-region correction speeds up the algorithm and the BFGS modification makes the con- 
vergence rate dimensionally independent. 

• More importantly, VEC seems to be a performing modeling tool for stock market log-returns 
when compared with other more parsimonious parametric families, even in dimensions where the 
high number of parameters in comparison with the sample size would make us expect a deficient 
modeling behavior. Our conjecture is that this better than expected results have to do with the 
spectral sparsity (in the dimensions we work on we should rather say spectral concentration) of 
the correlation matrices exhibited by stock market log-returns; this empirically observed feature 
imposes nonlinear constraints on the model parameters that invalidate the hypotheses necessary 
to formulate the standard results on the asymptotic normality of the quasi- loglikelihood parameter 



estimator (see later on expressions (4.1 1 and (4.2 1) and make it more favorable with respect to its 



use with standard sample sizes. In a forthcoming publication we plan to provide a detailed study 
of the convergence and complexity properties of the proposed algorithm, together with dimension 
reduction techniques based on the use of the spectral sparsity that, as we said, is empirically 
observed in actual financial time series. 



Notation and conventions: In order to make the reading of the paper easier, most of the proofs of the 
stated results have been gathered at the end in the form of appendices. All along the paper, bold symbols 
like r denote column vectors, denotes the transposed vector. Given a filtered probability space 
{ft,P,J^, {Tt}t£f>i) and X,Y two random variables, we will denote by Et[X] := E[X\Tt] the conditional 
expectation, covt{X,Y) cov{X, Y\Tt) := E'^^^] ~ -E't[-''^]-E't[^] the conditional covariance, and by 
va.Vt{X) := EtlX"^] — Et[X]'^ the conditional variance. A discrete-time stochastic process {XtjteN is 
predictable when Xt is J't-i-measurable, for any t e N. 



2 Preliminaries on matrices and matrix operators 

Matrices: Let n,m G N and denote by M„_m the space of n x to matrices. When n — m we will just 
write M„ to refer to the space of n x n square matrices. Unless specified otherwise, all the matrices in this 
paper will contain purely real entries. The equality A — (aij) denotes the matrix A with components 
Qij e M. The symbol §„ denotes the subspace of M„ that contains all symmetric matrices 

S„ ^{AeM^lA^^A} 

and (respectively is the cone in S„ containing the positive (respectively negative) semidefinite 
matrices. The symbol A ^ (respectively A ^ 0) means that A is positive (respectively negative) 
semidefinite. 

We will consider M„_m as an inner product space with the pairing 

(A, S) = trace(AB^) (2.1) 
and denote by || A|| = {A, A) 5 the associated Frobenius norm. Given a linear operator A : M„ .,„ Mp^g 



we will denote by A* : M* ^ M* ^ its adjoint with respect to the inner product (|2.1 1 
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The vec, vech, mat, and math operators and their adjoints: The symbol vec : M„ M" 
denotes the operator that stacks ah the columns of a matrix into a vector. Let = ^n{n + 1) and let 
vech : S„ be the operator that stacks only the lower triangular part, including the diagonal, of a 

symmetric matrix into a vector. The inverse of the vech (respectively vec)operator will be denoted by 
math : §„ (respectively mat : M"' -)> M„). 

Given n e N and N = ^n{n + 1), let S = e {1, . . . , n} x {1, . . . , n} | z > j} we define 

(T : 5 — > {1, . . . , N} as the map that yields the position of component > j, of any symmetric 

matrix in its equivalent vech representation. The symbol : {1, . . . , A'^} — > S will denote its inverse 
and a : {1, . . . , n} X {1, . . . , n} — > {1, . . . , TV} the extension of a defined by: 

The proof of the following result is provided in the Appendix. 

Proposition 2.1 Given n e N and N = |n(n + 1), let A e S„ and m e arbitrary. The following 

identities hold true: 

(i) (vcch(A),TO) = + diag(A),math(m)). 

(ii) (A, math(m)) = 2(vcch(v4 - idiag(^)), to), 

where diag(A) denotes the diagonal matrix obtained out of the diagonal entries of A. Let vech* : — >■ 
§„ and math* : §„ be the adjoint maps of vech and math, respectively, then: 



math*(A) = 2 vech - -diag(A) j , (2.3) 

vech*(m) = ^ (math(m) + diag(math(m))) . (2.4) 

The operator norms of the mappings that we just introduced are given by: 

||vech||op = 1 (2.5) 

||math|jop = (2.6) 

||vech*|Up = 1 (2.7) 

||math*||op = 72 (2.8) 

lldiaglUp = 1 (2.9) 



Block matrices and the S operator: let n S N and B s M„2 . The matrix B can be divided into 
blocks Bij G M„ and hence its components can be labeled using a blockwise notation by referring to the 
(A;, I) element of the {i,j) block as {Bij)ki- This notation makes particularly accessible the interpretation 
of B as the coordinate expression of a linear endomorphism of the tensor product space R" ig)M". Indeed 
if {ei, . . . , e„} is the canonical basis or R", we have 

n 

B{ei ® efe) = ^ {Bij)ki{ej e;)- (2.10) 
Definition 2.2 Let A e M^v with N = ^n(n+l). We define T,{A) S §„2 blockwise using the expression 

{^A„^k,l),cr{i,j), if i>j 



[ Ifk<l J:{A)ki =^{A)ik, 
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where a is the map defined above that yields the position of component [i,j),i > j, of any symmetric 
matrix in its equivalent vech representation. By construction {'E{A)ki)ij is symmetric with respect to 
transpositions in the {k,l) and {i,j) indices; this implies that is both symmetric and hlockwise 

symmetric. We will refer to any matrix in §„2 with this property as n-symmetric and will denote the 
corresponding space by S^2 ■ 

The proofs of the next two resuhs are provided in the Appendix. 

Proposition 2.3 Given H G E>n andAGM^, with N — ^n{n+l). then-symmetric matrixTi{A) €§"2 
that we just defined satisfies: 

Avech(i?) = vech(E(A) • iJ), (2.12) 

where S(^) is the symmetric matrix given by 

{^{A) . H)ki - {nA)kuH) - trace(S(A)fe,i/). 

Proposition 2.4 Let S : Mat — > M„2 be the operator defined in the previous proposition, N = |n(n+l). 
Then, for any B £ M^^ , the corresponding dual map E* : M„2 — >• Mat is given by 

Y.*{B)^2B-B, (2.13) 

where B , B £ M.^ are the matrices defined by 

Bpq = ((]P"2(^))cr-i(p))(T-i(g), and Bpq = -Bp5(5pi.i('^-i(p)),pi2('^"Mp))- 
The symbol P"2 (B) denotes the orthogonal projection of B € M„2 onto the space SJ^a of n-symmetric 



matrices that we spell out in Lemma 7.1 As we saw in Proposition 2.3 E maps into the space SJJa of 
symmetric matrices; fe< E : M^r — >■ §^2 be the map obtained out of Yi by restriction of its range. The 
map Y} is a bijection with inverse Y}^^ : §^2 — > M^r given by 

(Y-\B)j^^ = (B<,-i(p))^_i^^^ (2-(5p,^(^-i(q))^pr^(^-i(,))) . (2.14) 

3 The VEC-GARCH model 

Consider the n-dimensional conditionally heteroscedastic discrete-time process {zt} determined by the 
relation 

Zt - Hl^^et with {et} ^ IIDN(0, /„). 

In this expression, {Ht} denotes a predictable matrix process, that is for each t £ N, the matrix random 
variable Ht is J^t_i-measurable, and hI^^ is a square root of Ht, hence it satisfies H]^'^{H]^'^)^ = Ht. 
In these conditions it is easy to show that the conditional mean -Et[zt] = and that the conditional 
covariance matrix process of {zf} coincides with {Ht}. 

Different prescriptions for the time evolution of the conditional covariance matrix {Ht} determine 
different vector conditional heteroscedastic models. In this paper we will focus on the VEC-GARCH 
model (just VEC in what follows). This model was introduced in |BEW88j as the direct generalization 
of the univariate GARCH model |Bol86' in the sense that every conditional variance and covariance is 
a function of all lagged conditional variances and covariances as well as all squares and cross-products 
of the lagged time series values. More specifically, the VEC(q,p) model is determined by 

q p 

ht=c-\-Y^ Ai-qt^, -t- ^ B,ht-i, 

i=l i=l 
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where ht := vech{Ht), rjj := vech(ztz-^), c is a TV-dimensional vector, with N := n{n + l)/2 and 
A„B,€Mn. 

In the rest of the paper we wiU restrict to the case p = q = \, that is: 

Zt = Hl'^et with {et}^IIDN(0,/„), .3^. 
ht = c + A'nt^i+ Bht-i. 

In this case the model needs iV(2A^ + 1) — \{'n? +n){n'^ + n + l) parameters for a complete specification. 
3.1 Positivity and stationarity constraints 



The general prescription for the VEC model spelled out in (3.1) does not guarantee that it has station- 
ary solutions. Moreover, as we saw above, the resulting matrices {Ht}ti£N are the conditional covariance 
matrices of the resulting process and therefore, additional constraints should be imposed on the param- 
eter matrices c. A, and B in order to ensure that they are symmetric and positive semidefinite. Unlike 
the situation encountered in the one-dimensional case, necessary and sufficient conditions for positivity 
and stationarity seem very difficult to find and we will content ourselves with sufficient specifications. 

Positivity constraints: we will use the sufficient conditions introduced by Gourieroux in [Gou97j 
that, as we show in the next proposition, can be explicitly formulated using the map E introduced in 
Definition [m 



Proposition 3.1 If the parameter matrices c, A, and B in [3.1) are such that va&t\i{c) ^Ti{A) , andTi{B) 
are positive semidefinite then so are the resulting conditional covariance matrices {Ht}t<£N, provided the 
initial condition Hq is positive semidefinite. 

Second order stationarity constraints: Gourieroux |Gou97| has stated sufficient conditions in terms 
of the spectral radius of A + B that we will make more restrictive in order to ensure the availability of 
a formulation in terms of positive semidefiniteness constraints. 



Proposition 3.2 The VEC model specified in (3.1) admits a unique second order stationary solution 



if all the eigenvalues of A + B lie strictly inside the unit circle. This is always the case whenever the 
top singular eigenvalue crinax(^ + ^) of A -\- B is smaller than one or, equivalently, when the matrix 
In — [A + B){A -\- B)^ is positive definite. If any of these conditions is satisfied, the marginal variance 
of the model is given by 

r(0) = math(£;[ht]) math((lAr -A- B)-^c). (3.2) 

3.2 The likelihood function, its gradient, and computability constraints 

Given a sample z = {zi, . . . ,zt}, the quasi-loglikelihood associated to ( |3.1[ ) is: 

T T 

TN 1 1 

logL(z; 6) = - _ log 2^ - - ^ log(det Ht) - ^jH^'^t (3.3) 

t=i t=i 

where 6 :— (c. A, B). In this expression, the matrices Ht are constructed out of 9 and the sample z using 



the second expression in (3.1 1. This implies that the dependence of logL on 6 takes place through the 
matrices Ht. Notice that these matrices are well defined once initial values Hq and Zq have been fixed. 
This initial values are usually taken out of a presample; if this is not available it is customary to take 
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the mean values associated to the stationary model, namely z = and Hq = math((lAr — A — B) ^c) 
(see (3.2)). Once the initial conditions have been fixed, it can be shown by induction that 




B'ho. 



(3.4) 



i=0 



The maximum likelihood estimator of is the value that maximizes (3.3) for a given sample z. The 
search of that extremal is carried out using an optimization algorithm that we will discuss later on in 
the paper and that requires the gradient VelogL(z; 9) of logL. In order to compute it we write the total 
quasi-loglikelihood as a sum of T conditional loglikelihoods 



lt{zt;A,B,c) 
A lengthy calculation shows that: 



N 



1 



1 



log27r - - log(deti/t) - -zj H^\t 



yjt = 


{.It - 


r.)"E^^ ' 

i=0 




^Ah = 


"i-l 

Y,rit-^-Alt-^tf B^ 


T 


VbU = 


t-1 

E 

j=0 


i-l 

Ei3^(c + AT,,_,_ 


,) (7t - r*)^ B'-^-^ + i?^ho (7t - Ttf 



(3.5) 
(3.6) 
(3.7) 



where 



"^-math*(iJt ^), 7i ^math*(A4), 



and A* 



Hr^ZfziH. 



tzt ■ 



These formulas for the gradient were obtained by using the explicit expression of the conditional covari- 
ance matrices (3.4) in terms of the sample elements and the coefficient matrices. Such a closed form 
expression is not always available as soon as the model becomes slightly more complicated; for example, 
if one adds to the model (3.1) a drift term like in |Dua95j for the one dimensional GARCH case, an 
expression like (3.4) ceases to exist. That is why, in the next proposition, we introduce an alternative 
iterative method that can be extended to more general models, it is well adapted to its use under the 
form of a computer code and, more importantly, suggests the introduction of an additional estimation 
constraint that noticeably shortens the computation time needed for its numerical evaluation. 



Proposition 3.3 Let z = {zi, 

loglikelihood introduced in (pTT 



..,zt} be a sample, 6 := {c,A,B), and let logL(z; 0) be the quasi- 
Then, for any component 9 of the three-tuple 6, we have 



Vg log i = E = E ^e-ff* • "^Hjt, whe 



t=l 



1 



and the differential operators TgHt are determined by the recursions: 

T*Ht ■ A = math* (A) + T*Ht-i ■ vech*(B^math*(A)), 

TXHt ■ A = math* (A) • t]J_^ + T^Ht-i ■ vech* (B^math* (A)), 

T^Ht ■ A = math* (A) • vech(iJf_i)'^ + T^^t-i • vech*(B^math*(A)), 



(3.8) 
(3.9) 

(3.10) 
(3.11) 
(3.12) 
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with Tj^ = vec h(ztz'^ ), A g §„ and setting T*Hq = 0, Tj^Hq = TgHo = 0. The operators TgHt 
constructed in (3.10)-(3.12) are the adjoints of the partial tangent maps T^Ht : §„, T^Ht '■ 



Mjv -> Sn, and TsHt : Mm — >■ Sn Ht{c,A,B) :— math{ht{c,A,B)), with ht{c,A,B) as defined 



in (3.4-) 



Matrix expression of the recursions (3.10 ) ( 3.12| ): the use of Proposition 3.3 requires translating 
the operator recursions ( 3.10 1-( 3.12) into matrix recursions. In this particular case this can be achieved 
by writing A S S„ as A = vech*(v), with v = math* (A) G M^. With this change of variables, the 



expression (3.10) becomes 



T*Ht ■ vech*(v) = V + T;Ht-i ■ vcch*(B^v) 

Let Ct G Mjv be the matrix associated to the linear operator T^iJtovech* : 
the matrices {ct}tg{i,...,T} sre determined by the recursions 



(3.13) 



In view of (3.131 



Ct 



Ct-iB' 



(3.14) 



Once the family {ct}tg{i has been computed, it can be used in (3.8) by noticing that 

T*Ht -A^Cf math* (A). 
7V2 jv be the matrix associated to the linear operator vec o T^Ht o vech* 



Regarding (3.11 ), let A e 



^ . Given that 

vec(vJ7f_i) = vec(Ijvv?7f_i) = {r]^_^ » 1^) vec(v) = {r]^_^ 1^) v, 



(3.15) 



the recursion (3.11) implies that the family {^t}te{i....,T} is determined by 

At = {r]t_i(E)lN) + At-iB^ 

and hence, once it has been computed, it can be used in ( |3.8[ ) by noticing that 

TXHt • A = mat {At ■ math* (A)) , 

where we recall that mat denotes the inverse of the vec operator. Finally, let Bt G M[jv2,Ar be the 
matrix associated to the linear operator vec o TgHt ° vech* : . The family {St}tg{i,....T} is 

determined by 

Bt^{vcch{Ht-i)^lN) + Bt-iB^, and hence, T^i/f • A = mat (Bt • math* (A)) . (3.16) 

The computability constraints. In the particular case of the VEC(1,1) model, the existence of the 
matrix recursions ( |3.14[ )-(3.16) associated to ( 3.10 )-( 3.12) makes the computation of (3.8) relatively 
easy. For more general models, the matrix recursions are not easily available and one is forced to work 



directly with (the analog of) (3.10)-(3.12|. In those cases, if we deal with a long time series sample, the 



computation of the gradient (3.8 ) may turn out numerically very expensive since it consists of the sum of 



T terms T'gHfV hM, each of which is made of the sum of the t terms recursively defined in (3.10 )-(3.12 ) 



A major simplification can be obtained if we restrict ourselves in the estimation process to matrices B 
whose top eigenvalue is in norm smaller than one. The defining expressions for the differential operators 
TgHt show that in that situation, only a certain number of iterations, potentially small, is needed to 



compute the gradients with a prescribed precision. This is particularly visible in the expressions (3.5 1 



(3.7 1 where the dependence on the powers of B makes very small many of the involved summands 
whenever the spectrum of B is strictly contained in the unit disk. This is the reason why we will 
impose this as an additional estimation constraint. The details of this statement are spelled out in the 
proposition below that we present after the summary of the constraints that we will impose all along 



the paper on the model (3.1): 
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(SC) Stationarity constraints: 1^(1 - eab) - {A + B){A + B)'^ > for some small cab > 0. 



(PC) Positivity constraints: math(c) — ecln 'ti 0, S(A) — e^I„2 ^ 0, and T,{B) — eBl„2 ^ 0, for some 
small CA, es, Ec > 0. 

(CC) Computability constraints: 1^(1 — ?b) — BB^ ^ for some small cb > 0. 

Proposition 3.4 Let t (z N b e a fixe d lag and let TgHt he the differential operators defined by applying 
t times the recursions (3.10)-(3.12). Consider now the operators TgHj^ obtained by truncating the 
recursions { 3.10)-(3.12) after k iterations, k <t. If we assume that the coefficients c,A, and B satisfy 



the constraints (SC), (PC), and (CC) then the error committed in the truncations can be estimated 
using the following inequalities satisfied by the operator norms: 



t Hop 



< 



2(1-?^)'= 



\E[TXHt-TXHl]\U, < 



\E[nHt-nH^] Hop 



< 



2(1-?b)^||c| 

CAB 

2(1~6b)^I|c| 

£AB 



(3.17) 
(3.18) 
(3.19) 



Notice that the last two inequalities estimate the error committed in mean. As consequence of these 



relations, if we allow a maximum expected error S in the computation of the gradient {3.8) then a lower 
bound for the number k of iterations that need to be carried out in {3.1(J^ - l3.12 ) is: 



log 



2 



log 



log(l -es)' log(l-eB) 



(3.20) 



Remark 3.5 The estimate (3.20) for the minimum number of iterations needed to reach a certain 



precision in the computation of the gradient is by no means sharp. Numerical experiments show that 
the figure produced by this formula is in general too conservative. Nevertheless, this expression is still 
very valuable for it explicitly shows the pertinence of the computability constraint (CC). 

Remark 3.6 Wc emphasize that the constraints (SC), (PC), and (CC) are sufficient conditions for 
stationarity, positivity, and computability, respectively, but by no means necessary. For example (SC) 
and (CC) could be replaced by the more economical (but also more restrictive) condition that imposes 
A,B £ §^ with Ainax(^ + B) < {1 — Cab)- In this situation it can be easily shown that A„iax(-B) < 1 
and hence the computability constrained is automatically satisfied. 



4 Calibration via Bregman matrix divergences 



In this section we present an efficient optimization method that, given a sample z, provides the parameter 
value 8 corresponding to the VEC(1,1) model that fits it best by maximizing the quasi-loglikelihood ( 3.3 ) 
subjected to the constraints (SC), (PC), and (CC). It can be proved under certain regularity hypothe- 
ses (see |Gou97[ page 119]) that the quasi-loglikelihood estimator 6 is consistent and asymptotically 
normal: 



Vf{9 - 6»o) — ^ A^(0, Qo) where Qq = A^^BqA^' 



with 



dist 



Aq = Eg 



dOde^ 



and Bq = Eqo 



06 dO^ 



(4.1) 



(4.2) 
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These matrices are usually consistently estimated by replacing the expectations by their empirical means 
and the true value of the parameter 9q by the estimator 6: 

4.1 Constrained optimization via Bregman divergences 

The optimization method that we will be carrying out to maximize the quasi-loglikelihood is based on 
the use of Burg's matrix divergence. This divergence is presented, for example, in jKSD09a] and it 
is a particular instance of a Bregman divergence. Bregman divergences are of much use in the context 
of machine learning (see for instance [DT07[ IKSDOQb] and references therein) . 

In our situation we have opted for this technique as it allows for a particularly efficient treatment of 
the constraints in our problem, avoiding the need to solve additional secondary optimization problems. 
In order to make this more explicit it is worth mentioning that we also considered different approaches 
consisting of optimizing quadratically penalized local first or second order models with the positive 
semidefinite constraints (PS), (SC), and (CC); since we were not able to find a closed form expression 
for the optimization step induced by this constrained local model, we were forced to use Lagrange 
duality. Even though the constraints admit a simple conic formulation that allowed us to explicitly 
formulate the problem, this approach finally resulted in a problem that is much more computationally 
demanding than just incorporating the constraints into the primal scheme using Bregman divergences, 
as we propose below. 

Definition 4.1 Let X,Y (z ond : S„ — > R a strictly convex differentiable function. The Breg- 
man matrix divergence associated to (f> is defined by 

D^{X,Y) := ^{X) - cl){Y) - trace {W<l>{Yf{X - Y)) . 

Bregman divergences are used to measure distance between matrices. Indeed, if we take the squared 
Frobenius norm as the function 0, that is (j){X) := then D^{X, Y) := ||X - Other example 

is the von Neumann divergence which is the Bregman divergence associated to the entropy of 
the eigenvalues of a positive definite matrix; more explicitly, if X is a positive definite matrix with 
eigenvalues {Ai, . . . , A„}, then (f){X) := ^^^li^i^ogXi — Xi). In our optimization problem we will be 
using Burg's matrix divergence (also called the LogDet divergence or Stein's loss in the statistics 
literature |JS61) ) which is the Bregman divergence obtained out of the Burg entropy of the eigenvalues 
of a positive definite matrix, that is 4>{X) := — log Ai, or equivalently 0(X) := — logdet(X). The 
resulting Bregman divergence over positive definite matrices is 

DBiX,Y) tracc(Xy"i) - \ogdet{XY-^) - n. (4.3) 

The three divergences that we just introduced are examples of spectral divergences, that is, the function 
(j) that defines them can be written down as the composition = o A, where f : M" — > M is 
differentiable strictly convex function and A : S„ — > M" is the function that lists the eigenvalues of X 
in algebraically decreasing order. It can be seen (see Appendix A in [KSDOQa] ') that spectral Bregman 
matrix divergences are invariant by orthogonal conjugations, that is, for any orthogonal matrix Q e 0„: 

D^iQ^XQ, Q^YQ) = D^iX, Y). 

Burg divergences are invariant by an even larger group since 



Db{M'^XM,M'^YM) = Db{X,Y), 
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for any square non-singular matrix M. Additionally, for any non-zero scalar a: 

DB{aX,aY)^DB{X,Y). 

The use of Bregman divergences in matrix constrained optimization problems is substantiated by 
replacing the quadratic term in the local model, that generally uses the Frobenius distance, by a Bregman 
divergence that places the set outside the constraints at an infinite distance. More explicitly, suppose 
that the constraints of a optimization problem are formulated as a positive definiteness condition A'^Q 
and that we want to find 

argmin f{A), 

by iteratively solving the optimization problems associated to penalized local models of the form 

fAi^M) := / + (V/ + (4.4) 

If in this local model we take (t>{X) := and the elastic penalization constant L is small enough, the 

minimizer argmin^^Q /^(n) (A) is likely to take place outside the constraints. However, if we use Burg's 
divergence Db instead, and A*-"-* is positive definite, then so is arg min^^Q /^(n) (A) for no matter what 
value of the parameter L. This is so because as A approaches the constraints, the term D,p{A, A^"'^) 
becomes increasingly close to infinity producing the effect that we just described; see Figure [4T] for an 
illustration. The end result of using Bregman divergences is that they reduce a constrained optimization 
problem to a series of local unconstrained ones. 




Figure 4.1: The blue function is subjected to the constraint x > 400 and, being strictly increasing, attains its minimum 
at X = 400. On the left hand side we use a standard quadratically penalized local model of the function and 
we see that its minimum is attained outside the constrained domain. On the right hand side we replace the 
quadratic penalization by a Bregman divergence that forces the local model to exhibit its optimum at a point 
that satisfies the constraints. 
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4.2 Bregman divergences for VEC models 

Before we tackle the VEC estimation problem, we add to (SC), (PC), and (CC) a fourth constraint 
on the variable c S M.^ that makes compact the optimization domain: 

(KC) Compactness constraint: KI^ — math(c) >: for some if e M. 

In practice the constant K is taken as a multiple of the Frobenius norm of the covariance matrix of the 
sample. This is a reasonable choice since by (3.2|, in the stationary regime c — (I^r — A — i?)vech(r(0)); 
moreover, by the constraint (SC) and (2.5) we have 

||c|| -||(lAr-A-S)vech(r(0))|| < ||lAr-^-B||op||vec||op||r(0)|| < 2||r(0)||. 

Now, given a sample z and a starting value for the parameters 6o — (cq, Aq, Bq), our goal is finding 
the minimizer of minus the quasi-loglikelihood f{6) := —\ogL{z;6), subjected to the constraints (SC), 
(PC), (CC), and (KC). We will worry about the problem of finding a preliminary estimation do 
later on in Section |4.4[ As we said before, our method is based on recursively optimizing penalized 
local models that incorporate Bregman divergences that ensure that the constraints are satisfied. More 
specifically, the estimate of the optimum 0^"+^) after n iterations is obtained by solving 



6/("+i) = argmin /("^(6»), (4.5) 



where /^"^ is defined by: 



/(")(6/) = /(6/(")) + (V/(6>(")),6>-6/(")) + ^Db{In - {A + B)^ {A + B) ,1^ - (A^") + s("))^(A(") + B^"))) 

+ ^DB(math(c),math(c("')) + ^Db{KIn - math(c),XlAr - math(c("))). (4.6) 

Notice that for the sake of simplicity we have incorporated the constraints in the divergences with the 
constraint tolerances ^ab^^At^Bi'^b^ and set equal to zero. 

The local optimization problem in (4.5) is solved by finding the value 6q for which 

V/(")(0o)=O. (4.7) 

A long but straightforward computation shows that the gradient V f^"'\0) is given by the expressions: 

'^A~&\e) = V^/(0(")) - + B) {^{In - (A("' + B("))^(A(") + B("))) - {In - (A + Bf{A + S))"'^ 
+ (l](A("))-i - E(A)-i) , (4.8) 

Vb/("'(0) = Vb/(0(")) - Li(A + B) (^{In - (^(") + b("))^(A(") + b(")))"' - (Iat -(A + Bf{A + B))"'^ 
+ (l](S("))-i -S(B)-i) -LiB (^(ijv - {In-B'^BY^^ , (4.9) 

Vc/^"H^) = Vc/(0*"^) + Y^ath* (math(c("))-i-math(c)-i 



^6 

2 



math* {{KIn - math(c("'))"^ - {KIn - math(c))"i) , (4.10) 
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where Vgf{0^^^) = — VelogL(z; 0^"^) is provided by the expressions in Proposition 3.3 We will nu- 
merically find the solution of the equation (4.71 using the Newton- Raphson algorithm, which requires 
computing the tangent map to V/("^(0). In order to do so, let g["'\A,B), g^\A,B), and g^\c) 
be the functions in the right hand side of the expressions (4.8), (4.9), and ( |4.10[ ), respectively, and 

g(")(A,B,c) (^gi"^(A,B),5^"'(A,B),5^"^(c)); additionally, define the map A{A) : ^ Mat by 
A{A) := In - A^ A, as well as 



s(r'(A) = -A A A( 



-k{Ay 



AK{A)-^ (A^(^) + A'^A) A(^)- 



for any A, A e Mjy. A straightforward computation shows that: 



Li4"j^(A^) 



X^(A^),Li4"|5(As 



T(^,B)5^"^ •(Aa,Ab) 

Tc5^"^ • Ae 



2 



L3 

2 



L,E^b\Ab) 



(4.11) 
(4.12) 

(4.13) 
(4.14) 



-math* (math(c) ^math(Ac)math(c) 



^math* {{KIn ~ math(c))-imath(Ac)(i^lAr - math(c))"i) (4.15) 

In obtaining these equalities we used that the tangent map to the matrix inversion operation inv(Ar) :— 
is given by Txinv • A = —X^^AX^^ and hence 

TaA{A)-^ ■ Aa = A(A)-i (A^A + A'^Aa) A(A)-i and TA'SiA)-^ ■ Aa = -^{A)-^^{Aa)T.{A)-\ 

The use of the tangent maps ( |4.13 )-(4.15 ) in a numerical routine that implements the Newton- Raphson 
method requires computing the matrix associated to the linear map T(^a,b,c)9^^"' ■ A major part in this 
task, namely the matrix associated to the map S^") in (4.11), admits a closed form expression that 
avoids a componentwise computation. Indeed: 



vec(S^')(A)) 



A - A{A)-^j (g) In vec(A) + \^{AA{A)-^f (g) AA{A)-^'^ KNNvec{A) 

+ [A(yl)-i^ ® AA{Ar^A^] vec(A), (4.16) 
where Knn is the (iV, iV)-commutation matrix (see |MN79j ). This expression implies that the matrix 



G Mjv2 associated to the linear map ^\ : Mjv — > Mjv is given by 



^(") _ 



a(a(")) -A{A)-^]®In 



{AA{A)-^)^ (S) AA{A)-^ 



K 



NN- 



\A{A) 



-IT , 



'AA[A)-^A^] 
(4.17) 



In order to obtain ( |4.16 ), we used the following properties of the vec operator: 

vec{AB) ^ (B^ ® I) vec(A), vec{ABC) = (C^ ® A) vec(A) and vec(A^) = KNN^ec{A), 

for any A, i?, C € Mjy. We have not found a closed formula for the matrices associated to the other linear 
maps that constitute (4.13)-(4.15) and hence they need to be obtained in a componentwise manner by 
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applying them to all the elements of a canonical basis. Let Xa, ^(yt ,_B.c)g ^"^ ^nd Tc^g"^ be the matrices 
associated to Xa, Tj^A.B,c)9^"'\ ^cSs"': respectively. Then, by (4.13 )-(4.15 1, we have: 



A+B T" 



V 







r "(») 



\ 



(4.18) 



Using this matrix, the solution 0o of (j4j| is the limit of the sequence {6)^"' '"^jfcgN := {(^^"' B^"' c^"' '=))}fceN 
constructed using the prescription 0^"' = 0''"'' = S*^"), c*^")) and by iteratively solving the linear 

systems: 

c(A("^ '=+!)) \ / vec(A(">'=)) \ 

• I vec(B("^ = -vec (v/(") re(„, .jgH • vec(B("''=)) . (4.19) 



Remark 4.2 Since the t angent map Teg^^^ can be assimilated to the Hessian of /'■"\ its matricial 
expression Tgg^"-^ in (4.18) should be symmetric. When this matrix is actually numerically constructed. 



the part resulting from the matrix identity (4.171 is automatically symmetric. The rest, that comes 



out of a componentwise study, may introduce numerical differences that slightly spoil symmetricity and 
that, in practice, has a negative effect in the performance of the optimization algorithm as a whole. 

That is why we strongly advice to symmetrize by hand T^g^") once it has been computed. 

4.3 Performance improvement: BFGS and trust-region corrections 

The speed of convergence of the estimation algorithm presented in the previous section can be sig- 
nificantly increased by enriching the local model with a quadratic BFGS (Broyden-Fletcher-Goldfarb- 
Shanno) type term and by only accepting steps of a certain quality measured by the ratio between the 
actual descent and that predicted by the local model (see |CGTOO I and references therein) 



The BFGS correction is introduced by adding to the local penalized model /'■"■' (0) defined in (4.6) 
the BFGS Hessian proxy i?*^") iteratively defined by: 



y{n-l)y{n-l)T H(n-l) g(n-l) g{n-l)T jj{n-l) 



V 



(n-1) T„(n-1) 



with an arbitrary positive semidefinite matrix and where s*^" := 0^""^ — and y^"^ := 

V/(0(")) - V/(0(""^^). More specifically, we replace the local penahzed model /("H^) by 



/(")(6>) := /(")(0) + e^"'>Y H^''^ (e-e 

whose gradient is obviously given by: 



(") 



{6) :== V/(")(0) = V/(")(0) (6>-6>("') =5(")(6/) + - 6/'")) , 



with ^("H^) = V/(")(0) given by ( |4.8[ )~( |4.10| . Using this corrected local penalized model, the solution 

(4.20) 



of the optimization problem will be obtained by iteratively computing 

6/("+i) = argmin /("'(^')- 



XMjvXMjv, 



Multivariate GARCH estimation via a Bregman-proximal trust-region method 



15 



This is carried out by finding the solution 9q of the equation 



0. 



(4.21) 



using a modified version of the Newton-Raphson iterative scheme spehed out in (4.19). In deed , it is 
easy to show that 6q is the hmit of the sequence {0'"' '^•'jfeeN constructed exactly as in Section ^ 
the linear systems (4.19) are replaced by 



4.2 



where 



TeZX^g^''^ +H^^^ .0(",fe+i) ^ _vec(v/(")(6>("''=))) + fl« . 6>(") + Te?r^(") • 6>("' (4.22) 
I vec(A("^'''+i)) 

where = vec(B("' '^+1)) | and G Ma^v^+Ar denotes the matrix associated to that 
satisfies 



f • e\ = • vec(B) 



for any 6 — (A, B, c) 



Important remark: the Newton-Raphson method and the constraints. In Section |4.1| we 
explained how the use of Bregman divergences ensures that at each iteration, the extremum of the 
local penalized model satisfies the constraints of the problem. However, the implementation of the 



Newton-Raphson method that provides the root of the equation (4.21) does not, in general, respect the 
constraints, and hence this point requires special care. 

In the construction of our optimization algorithm we have used the following prescription in order 
to ensure that all the elements of the sequence {0^"' '^■'jfegN that converge to the root 6q satisfy the 
constraints: given 0^"'^) — 0^") (that satisfies the constraints) let 0^"' ^^ be the second value in the 



Newton-Raphson sequence obtained by solving the linear system (4.22 1. If the value thereby 



constructed satisfies the constraints it is then accepted and we continue to the next iteration; otherwise 
we set 

g(n,2) _ n{n,l) 

, (4.23) 



0(^:2) ._ Q(n, 1) 



iteratively until 0'^"'^^ satisfies the constraints. Notice that by repeatedly performing (4.23), the value 
6^^' hence constructed is closer and closer to since this latter point satisfies the constraints, so 

will at some point ^' . This manipulation that took us from 6^^' to S*-"' ^' in a constraint compliant 
fashion has to be carried out at each iteration to go from ^f"'*-') to ^("'''+1). 

Trust-region iteration acceptance correction: given an starting point 0° we have given a prescrip- 
tion for the construction of a sequence {0^"-*}„gN that converges to the constrained minimizer of minus 
the quasi-loglikelihood f{0) :— — logL(z; 9). We now couple this optimization routine with a trust-region 
technique. The trust-region algorithm provides us with a systematic method to test the pertinence of 
an iteration before it is accepted and to adaptively modify the strength of the local penalization in order 
to speed up the convergence speed. In order to carefully explain our use of this procedure consider first 
the local model (4.5 1 in which all the constants Li, . . . ,Lq that manage the strength of the constraint 



penalizations are set to a common value L. At each iteration of (4.20) compute the adequacy ratio 
p^") defined as 

/(»)(0(") )-/(") (©("-!)) 

which measures how close the descent in the target function in the present iteration is to the one 
exhibited by the local model The values that can be obtained for p^"-^ are classified into three 

categories that determine different courses of action: 
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Too large step < 0.01: there is too much dissimilarity between the local penalized model 
and the actual target function. In this situation, the iteration update is rejected by setting 
^(n) _ g(n~i) penalization is strengthened by doubling the constant: L = 2L 

Good step 0.01 < p^"-' < 0.9: the iteration update is accepted and the constant L is left 
unchanged. 

Too small step 0.9 < p^"-*: the iteration update is accepted but given the very good adequacy 
between the local penalized model and the target function we can afford loosening the penalization 
strength by setting L = as the constant that will be used in the next iteration. 



Remark 4.3 Even though the definition of the adequacy ratio in (4.24) uses the full penalized local 



models /*•"-', we have seen that in practice the linear approximation suffices to obtain good results. 



4.4 Preliminary estimation 

As any optimization algorithm, the one that we just presented requires a starting point 0^"-'. The 
choice of a good preliminary estimation of 6^^"^ is particularly relevant in our situation since the quasi- 
loglikelihood exhibits generically local extrema and hence initializing the optimization algorithm close 
enough to the solution may prove to be crucial in order to obtain the correct solution. 

Given a sample z = {zi, . . . , z^}, a reasonable estimation for 0^°^ can be obtained by using the 
following two steps scheme: 

1. Find a preliminary estimation of the conditional covariance matrices sequence {-ffi, . . . , Ht} 
out of the sample z. This can be achieved by using a variety of existing non-computationally intensive 
techniques. A non-exhaustive list is: 

(i) Orthogonal GARCH model (0-GARCH): introduced in [Din94[ IXC97[ IAle981 EleOS] : this technique 

is based on fitting one-dimensional GARCH models to the principal components obtained out of 
the sample marginal covariance matrix of z. 

(ii) Generahzed orthogonal GARCH model (GO-GARCH) |vdW02| : similar to 0-GARCH, but in this 

case the one-dimensional modeling is carried out not for the principal components of z but for its 
image with respect to a transformation V which is assumed is assumed to be just invertible (in 
the case of 0-GARCH is also orthogonal) and it is estimated directly via a maximum likelihood 
procedure, together with the parameters of the onc-dimensional GARCH models. GO-GARCH 
produces better empirical results than 0-GARCH but it lacks the factoring estimation feature that 
0-GARCH has, making it more complicated for the modeling of large dimensional time series and 
conditional covariance matrices. 

(iii) Independent component an alysis (ICA-GA RCH): |WYL06l [GFGPP08| this model is based on a 
signal separation technique |Com94llH097| that turns the time series into statistically independent 
components that are then treated separately using one dimensional GARCH models. 

(iv) Dynamic conditional correlation model (DCC): introduced in |TT02[[Eng02| , this model proposes a 
dynamic behavior of the conditional correlation that depends on a small number of parameters and 
that nevertheless is still capable of capturing some of the features of more complicated multivariate 
models. Moreover, a version of this model |ES01| can be estimated consistently using a two-step 
approach that makes it suitable to handle large dimensional problems. 

Another method that is widely used in the context of financial log-returns is the one advocated by 
Riskmetrics |Ris96j that proposes exponentially weighted moving average (EWMA) models for the time 
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evolution of variances and covariances; this comes down to working with IGARCH type models with a 
coefficient that is not estimated but proposed by Riskmetrics and that is the same for all the factors. 

2. Estimation of 0^°' out of z and H — {i?t}tg{i,...,T} using constrained ordinary least squares. 

If we have the sample z and a preliminary estimation of the conditional covariances {^^(}tg{i,...,T}j ^ 
good candidate for 0^°^ = {A^°\ B^"\c^°')) is the value that minimizes the sum of the Euclidean norms 
St := ||ht - (c + Ari^_^ + Bht-i) that is, 

T T 
t=2 t=2 

subjected to the constraints (SC), (PC), (CC), and (KC). This minimizer can be efficiently found by 
using the Bregman divergences based method introduced in Sections |4.1| through |4.3| with the function 
s{A, B,c;z,, H) replacing minus the log-likelihood. However, we emphasize that unlike the situation 
in the log-likelihood problem, the choice of a starting point in the optimization of s{A,B,c;z,H) is 
irrelevant given the convexity of his function 



As a consequence of these arguments, the preliminary estimation is obtained by iterating (4.20) 



where in the local model (4.6) the map / is replaced by s. This scheme is hence readily applicable once 



the gradient of s, provided by the following formulas, is available: 

T 



t=2 
T 

Vbs = 2^[ch?:i+Ar,,_ihf_i+Sht_ihf_i-hth?li], 

t=2 
T 

Vcs = 2^ [c-|-^J74_i -f-Bht_i -ht] . 



5 Numerical experiments 

In this section we illustrate the estimation method presented in Section |4] with various simulations that 
give an idea of the associated computational effort and of the pertinence of the VEC model in different 
dimensions. 

The data set. We have used in our experiments the daily closing prices between January 3, 2005 and 
December 31, 2009 (that is, 1258 date entries) of the stock associated to the companies Alcoa, Apple, 
Abbott Laboratories, American Electric, Allstate, Amgen, Amazon.com, and Avon. All these stocks are 
traded at the NYSE in US dollars and, in the last date of our sample, they were all constituents of the 
S&P500 index. The quotes are adjusted with respect to dividend payments and stock splits. Figure [ST] 
represents graphically the data set. 

Computational effort associated to the estimation method. In table [ST] we have gathered the 
required computing time and the necessary gradient calls to fit VEC(1,1) models to the log-returns of 
our data set in different dimensions. In the n = \ column we present the results associated to fitting a 
VEC model to the log-returns of the first element of the data set; the same in the n — 2 column with 
respect to the log-returns of the first two elements of the data set, and so on. The stopping criterion 
for the algorithm is established by setting a termination tolerance on the function value equal to 10^^. 
The last row of the table shows how the algorithm becomes increasingly costlier with the dimensionality 
of the problem when the BEGS correction is dropped. The results of this experiment suggest that the 
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Figure 5.1: Stock quotes used in the numerical experiments. The quotes represent closing prices adjusted with respect to 
dividend payments and stock splits. Source: Yahoo Finance. 
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trust-region correction speeds up the algorithm and the BFGS modification makes the convergence rate 
dimensionally independent. 









Computation time and 


gradient calls 








n — 1 {3 parameters) 


n — 2 (21 parameters) 


n — 3 (78 pai'ametcrs) 


n — 4 (210 parameters) 


n — 5 (465 parameters) 


n — 6 (903 parameters) 




Grad. calls Time 


Grad. calls Time 


Grad. calls Time 


Grad. calls Time 


Grad. calls Time 


Grad. calls Time 


Full method 


50 1.45 sec 


97 58 sec 


99 3 min 9 sec 


94 18 mill 


85 73 mill 


105 5 hrs 30 min 


No BFGS 


106 2.30 sec 


281 159 sec 


378 8 mill 57 sec 


404 47 mill 


534 298 min 


591 22 hr 



Table 5.1: Computation time and gradient calls required when running the estimation method presented in Section [4] 
with and without the BFGS correction. The simulations were carried out using a nonparallelized Matlab script 
on an Apple computer endowed with two double core 3 GHz processors, 64 bits. 



Variance minimizing portfolios, proxy replication, and spectral sparsity. As we have aheady 
pointed out several times, the main concern when using VEC models lays in the overabundance of 
parameters, whose number may easily be bigger than the sample size in standard applications, even 
when dealing with low dimensional problems. This lack of parsimony already appears when dealing 
with our data set for it contains 1257 historical log-returns, while the VEC(1,1) model requires 1596 
parameters in dimension seven and 2628 in dimension 8. 

The goal of the following experiment consists of assessing how serious this problem is. More explicitly, 
we will study how the pertinence of VEC as a modeling tool evolves with the increase in dimensionality, 
when compared with other more parsimonious and widely used alternatives, namely: 

• Exponentially Weighted Moving Average (EWMA) model for the conditional covariance matrices 
with the autroregressive coefficient A = 0.94 proposed by Riskmetrics |Ris96j for daily data. 



• Orthogonal GARCH model (OGARCH), as in |Din94[ IXC97l IAle981 IHeOS] . 

• Dynamic conditional correlation model (DCC) of jTT021 [Eng02| . 
These modeling approaches will be tested by evaluating: 

• Comparative performance in the construction of dynamic variance minimizing port- 
folios: all the models that we just enumerated and that take part in our comparison share the 
form 



jirl/2 



with 



{et}-IIDN(0,/„), (5.1) 

where {Ht} is a predictable matrix process. What changes from model to model is the specifi- 
cation that determines the dynamical behavior of {Ht}; in the particular case of VEC(1,1), that 
specification is spelled out in (3.1 1. When (5.1 ) is fitted to the log-returns associated to our data 



set, the matrices {Ht} provide an (model dependent) estimate of the conditional covariance of the 
log-returns process. Moreover, it is not difficult to show that if w = (wi, . . . , w„)' is a weights 
vector such that Y^^=i'^i ~ 1' then the conditional variance of the net returns process of the 
associated portfolio is given by {w"^^tw}, where At is the matrix whose (i, j) entry A^j is given 

by 



4* 



exp 



ik ^jk 



with hlj the entry of the matrix Ht. A dynamic variance minimizing portfolio is a weights 
vector defined as the solution at each time step of the optimization problem 



arg min w"^ w . 



(5.2) 
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A straightforward application of Lagrange duality shows that the solutions Wt of (5.2 1 are given 
by either the zero eigenvectors of At, or by 

= (5.3) 
1 /ij 1 

when At is invertible, where i is an n-dimensional vector made exclusively out of ones. In our 



numerical experiment we will always fall in the situation contemplated in (5.3) and it is this 
expression that we will use to construct the dynamic variance minimizing portfolios associated 
to each of the different models that we are testing. Figure |5.2| shows the conditional variance 
of the net returns process associated to the variance minimizing portfolios corresponding to the 
different models under consideration. It is tempting to say that the most performing model is the 
one for which the conditional variance is consistently smaller; however, given that the conditional 
variance is model dependent, these quantities are not directly comparable and it is only the marginal 
variances of the optimal portfolios that can be put side to side. This comparison is carried out 
in table |5.2| in which we see that VEC allows the construction of portfolios with smaller variances 
than those corresponding to the other models in all the dimensions considered. 







Variance of optimal portfolios 


(xlO-4) 






n = 2 


n = 3 


n = 4 


n = 5 


n = 6 


n = 7 


n = 8 


EWMA 


5.03 


1.72 


1.50 


1.44 


1.49 


1.59 


1.62 


OGARCH 


5.29 


1.75 


1.50 


1.42 


1.42 


1.45 


1.50 


DCC 


4.96 


1.74 


1.47 


1.37 


1.38 


1.40 


1.43 


VEC 


4.91 


1.70 


1.39 


1.22 


1.15 


1.15 


1.12 



Table 5.2: Marginal variance of the net returns associated to the variance minimizing portfolios corresponding to the 
different models. 



• Goodness of fit between the associated conditional volatilities and the absolute values 
of the log-returns used as a proxy for conditional volatility: following |MZ69[[GN861RB97( 
IMCV02j , we evaluate the performance of the different models by considering the absolute values of 
portfolio returns as a proxy for conditional volatility and by checking how the different proposals 
coming from the models under scrutiny fit this proxy. Even though it is well known |AB97) 
that this is a very noisy proxy for volatility, this approach provides us with quick and simple 
to implement ways to compare different modeling approaches. The first one consists of fitting 
each of the models to the first i assets with i S {1,2,. ..,8} and computing the mean Euclidean 
distance (MSE) between the model associated conditional volatility and the proxy values; the 
results of this experiment are presented in table |5.3| where we see that VEC produces a smaller 
MSE in all the dimensions considered, even surprisingly at dimensions 7 and 8 where the number 
of parameters to be estimated is bigger than the sample size. The plausibility of this result is 
visually emphasized in figure |5.3| where we have depicted the conditional volatilities of one of the 
assets in our data set (AA) obtained out of the models under consideration in dimension 8, as 
well as of a one-dimensional GARCH model; these volatilities are graphically compared with the 
proxy. 

Finally, we have ranked the different models by studying their efhciency in modeling the volatility of 
constant random portfolios; more especifically, at each dimension i, i £ {1, 2, . . . , 8}, we randomly 
choose i weights using standard normally distributed variables and we appropriately normalize 
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31-Mar-2005 31-Oct-2006 01-Jun-2008 01-Jan-2010 





I 



Figure 5.2: Conditional volatility of the net returns associated to the dynamic variance minimizing portfolios constructed 
by fitting different models to the log-returns of our eight dimensional data set. The VEC estimation was 
carried out setting a termination tolerance on the function value equal to 10~^ and using an OGARCH based 
OLS preliminary estimation, as explained in Section 14.41 
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Figure 5.3: Conditional volatility of the asset Alcoa (AA) obtained out of eight dimensional modelings. The graphics 
EWMA, OGARCH, DCC, and VEC represent the volatility obtained as the square root of the (1, 1) compo- 
nents of the 8x8 conditional covariance matrices associated to those models. GARCH represents the volatility 
associated to a one-dimensional GARCH modeling of the log-returns of AA, and PROXY shows the absolute 
value of the AA log-returns. 
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Mean square error 


with respect to proxy ( x 10 






n = 1 


n = 2 


n = 3 


n = 4 


n = 5 


n = 6 


n = 7 


n = 8 


EWMA 


5.19 


4.31 


3.23 


2.71 


3.03 


2.89 


3.42 


3.39 


OGARCH 


5.08 


4.33 


3.24 


2.71 


3.02 


2.88 


3.44 


3.38 


DCC 


5.08 


4.23 


3.17 


2.66 


2.98 


2.85 


3.39 


3.43 


VEC 


5.08 


4.17 


3.12 


2.59 


2.91 


2.68 


3.17 


3.16 



Table 5.3: Mean square errors committed when modeling the absolute values of the returns with the conditional variance 
associated to the different models. 



them so that their sum equals to one. We then use the conditional covariance matrices provided 
by each of the models under consideration to compute the (model based) conditional volatility 
of the portfolio. We then regress the proxy for the portfolio volatility, namely the absolute value 
of the portfolio returns, on the various portfolio volatilities provided by the different models and, 
using the suggestion in |MCV02| we declare as the best model the one that produces the highest 
coefficient of determination R^. As the chosen proxy is know to be very noisy |AB97) the obtained 
R^ coefficients are rather small (typically between 0.2 and 0.3). Using this criterion, we randomly 
generated 5,000 portfolios at each dimension, and we recorded the percentage rate of relative 
success of each model with respect to the others. The results of the experiment are presented in 
table [5^ and show the superiority of VEC in all the dimensions considered. 



Success rate in modeling random portfolios (%) 



Number of assets 


n=8 


n=7 


u=6 


11=5 


n=4 


n=3 


n=2 


Number of VEC parameters 


2628 


1596 


903 


465 


210 


78 


21 


DCC 


25.70 


32.20 


32.00 


32.27 


17.70 


24.10 


24.50 


EWMA 


1.27 


0.9 











0.3 





OGARCH 


28.18 


19.40 


8.30 


7.87 


3.70 


21.80 


22.80 


VEC 


44.85 


47.50 


59.70 


59.83 


78.60 


53.80 


52.70 



Table 5.4: Percentage rate of relative success of each model with respect to the others in modeling the volatility of random 
portfolios. For each dimension n, we randomly generated 5,000 portfolios and we considered as the best model 
the one that produced the highest coefficient of determination when regressing the absolute values of the 
corresponding returns on the conditional covariances associated to the each model. VEC consitently presents 
the highest success rate regardless of the dimension. 



• Spectral sparsity and high dimensional estimation: a major surprise revealed by these 
numerical experiments is that the estimated models provide good empirical performance despite 
the highly unfavorable ratio between the sample size and the number of parameters to be estimated. 
In order to investigate the reasons for such a counterintuitive but pleasing phenomenon, we plotted 
the eigenvalues of ^{A) and T,{B) for n between 4 to 8. These plots, displayed in Figure 5.4 show 
that the estimators S(^) and S(i?) of S(A) and S(B) are spectrally very sparse, that is, have a 
very low rank. Thus, the solutions of the estimation problem are exactly the same as the ones 
we would have obtained under additional a priori rank constraints, a setting that would have 
implicitly reduced the dimension of the parameter space by a large factor. This suggests that 
in our particular empirical situation, the number of parameters that are actually independent is 
much smaller than the number of entries in the coefficient matrices A, B and c, which makes 
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possible the use of small relative sample sizes in the VEC context. The most obvious explanation 
for this phenomenon stems from the well-known fact that the conditional covariance matrices 
Ht corresponding to stock market returns present spectral accumulation (in small dimensional 
settings) or sparsity (in large dimensions). This seems to make the positive semi-definiteness 
constraints on S(A) and S(_B) highly active, which enforces a large number of eigenvalues to be 
equal to zero. 

Notice further that the proportion of nonzero eigenvalues of S(A) and S(i?) decreases very slowly 
as a function of the parameter space dimension and shows no particular abrupt transition when 
the dimension/sample size ratio becomes large. This phenomenon suggests that the constrained 
maximum likelihood approach is very stable for this hard estimation problem. On the other 
hand, pushing theses ideas further along the lines of recent works in sparse estimation and matrix 
completion problems |CR09i[CTT0) . one might expect that explicitly enforcing the spectral sparsity 
of the estimators might improve their performance for dimensions much larger than the ones 
explored in the present work. A rigorous treatment of these observations is needed and will be 
the subject of further research in a forthcoming paper. 

6 Conclusions 

In this paper we provided an adequate explicit formulation of the estimation problem for VEC models 
and developed a Bregman-proximal trust- region method to solve it. This combination of techniques pro- 
vides a robust optimization method that can be surely adapted with good results to more parsimonious 
multivariate volatility models. 

We carried out numerical experiments based on stock market returns that show the applicability of 
the proposed estimation method in specific practical situations. Additionally, our numerical experiments 
reveal how the empirically well documented spectral accumulation in the covariance structure of stock 
quotes implies, in the context of VEC modeling, implicit nonlinear constraints in the parameter space 
that make this parametric family competitive even in the presence of a highly unfavorable ratio between 
the sample size and the number of parameters to be estimated. The comparison has been carried 
out with respect to other standard and more parsimonious multivariate conditionally heteroscedastic 
families, namely, EWMA, DCC, and OGARCH. An in-dcpth study of this phenomenon will be the 
subject of a forthcoming publication. 
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Figure 5.4: Eigenvalues of T,{A) and for n between 4 to 8. The spectral sparsity evidenced in these plots suggests 

nonlinear constraints in the parameter space which explain the good empirical performance of the models 
despite the highly unfavorable ratio between the sample size and the number of parameters to be estimated. 
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7 Appendix 



7.1 Proof of Proposition 2.1 



We start with the proof of (i) by using the fohowing chain of equahties in which we use the symmetric 
character of both A and math(TO): 

(A + diag(A), math(m)) = trace(^ math(m)) + trace(diag(A)math(TO)) 

n 

= Aijmath{m)ji + Aijdij'niat]i{m)ji 

n 

— Aijmath.{m)ij + Aij'math{m)ij + 2 Aijm.aih{m)ij 



N 

= 2^ Aymath(m)ij = Ai^m^^ij) = 2 Ao.-i(^)TOg 
= 2(vech(A),m), 

as required. In order to prove (ii), note that the identity that we just showed ensures that 

(A,niath(TO)) = 2(vech(A),m) — (diag(A), math(r7i)). 

At the same time 



(7.1) 



(diag(yl), math(m)) = trace(diag(A)math(m)) ^'^^ Aiimai\\{m)ii — Aam, 



N 

^Y^diag{A)ijm„(ij) = ^ diag(yl)^-i(,)m, 

i>3 9=1 
N 

^ vech(diag(74))qTOg = (vech(diag(A)), m). 

9=1 



which substituted in the right hand side of (7.1) proves the required identity. Finahy, expression (2.3 1 
foUows directly from (ii) and as to (2.4) we observe that 



-{A + diag(A), math(m)) 



- trace((A + diag(A))math(TO)) 

- trace(ylmath(TO)) + - trace(diag(A)math(TO)) 

- (trace(Amath(m)) + trace(Adiag(math(m)))) 
-(yl,math(m) + diag(math(m))), 



which proves (2.4). Regarding the operator norms we will just prove (2.5) and (2.6) as the rest can be 
easily obtained out of these two combined with the expressions (2.3) and (2.4). We start by noticing 
that for any nonzero A — (aij) G §„: 



||vech(A)||2 _ Ez>j=i4+Sr=i 



n 9 



\AP 



2E 



,>j=l ^ij + 2^i=l '^ii 



1 
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Since the last summand in the previous expression is always positive we have that 

||vech(^)|| 



|vech||op = sup 

A<£S„,A^O 



\A\\ 



1, 



the supremum being attained by any diagonal matrix {Yl^>j=i '^fj ^ in that case). Consider now 
V = vech(A). Then: 



|math(i;)| 



iivech(A)i|2 e:m=i4+i:Li 



En 



(7.2) 



When we let A € §„ vary in the previous expression, we obtain a supremum by considering matrices with 
zeros in the diagonal (X]"=i '^li = 0) ^^id by choosing X]r>j=i '^ij — ^ oo, in which case j[vcch(!4)p ~^ ^• 
Finally, as the map vech : §„ — > is an isomorphism, (7.2) implies that 



I math 1 1 



sup 



math(u)| 



l^ll 



^ II I 



Aes^.A^^o l|vech(^)|| 



V2. 



7.2 Proof of Proposition 2.3 



We just need to verify that (2.11 ) satisfies (2.12 ). Let A;, / e {1, . . . , rt} be such that k > I. Then, 

(Avech(i/))cr(fc_;) = y^^Acr(kd),<7(i,j)Hij = ^g(fc,0,<T(t j) 

i>j i>j 



2 H ^'y{k.l),a(^,J)H^J + 3 H 
i>j i>j 



i>j 



i<j 



= ^(I](^)fci)^,^»j + J2{^iA)kl)^JH,, + Y,{^{A)kihH,j = trace(S](A)feii/) 



i>j 



as required. 



7.3 Proof of Proposition 2.4 



We start with the following Lemma: 

Lemma 7.1 Let A £ M„2. The orthogonal projections P„2(A) e §„2 and PJ^2(^) G SJJa o/A onto f/ie 
spaces of symmetric and n-symmetric matrices with respect to the Frobenius inner product (2.1) are 
given by: 



[A + A^) 



{KM))ki = -{A,i + Al + Ai>, + Ai^), 



(7.3) 
(7.4) 



for any block {Vl,{A))u ofVl,{A), fc, Z G {1, . . . , n}. 
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Proof. In order to prove ( |7.3[ ) it suffices to check tliat {A — P„2 (A) ,B)=0 for any B G S„2 . Indeed, 

{A-¥„2{A),B) = trace{AB) - ^tia.ce{AB) - ^ trace(A'^B) = 0. 

The result follows from the uniqueness of the orthogonal projection. Regarding we check that 

{A - ¥"^2 iA),B) = 0, for any B e SJ^^ • Given that for any k,l e {1, . . . ,n} the block {AB)ki is given by 
{AB)ki = Ylr=i ^krBri wc have 

n 

{A - {A), B) = trace(AB) - trace(P;^2 {A)B) = ^ tr&ce{AB)u - trace(P;j2 (A)B)u 

i=l 

n n 

= ^ iTace{AijBj{) - iY&ce{{¥'^2{A))ijBji) = ^ trace(A,jBj,) 



E 



^ trace(yly Bji) + ^ trace(A^i?ji) + ^ trace(AjiBji) + ^ trace(Ajj_Bji) 



0, 



where we used that, due to the rt-symmetricity of B trace(A^_Bji) = trace (i?J^ Ay ) — ir&ce{AijBji) and 

n 

trace(v4jii3jj) = trace (y4jji?ij) = trace(Ay Bjj). 

Analogously X]"j=i trace(AjjBji) = ira.ce{AijBij). ■ 

Now, in order to prove Proposition |2.4[ consider A S M^v and B G M„2. Since the image of the map 
S lies in S^^ we have that {B - V^^iB), T.{A)) = and hence 

= = {Vl2{B)+B-r:AB)MA)) = (p:2(S),S(A)) = (E*(P;:2(S)),A). 



This identity allows us to restrict the proof of (2.4 1 to the n-symmetric elements B £ §^J^2- Hence let 
B e §"2 and let a be the extension of the map a defined in ( |2.2[ ) . Then, 



J — 1 



kl=l 



n ^ 

L 

n n I ™ 



k,Li,j = l 
n 

E 



i=j = l 



n n 



E ^A^(.k,l):'y(.iJ)i^kl)ji 
k-J = l i>j 



Kk 



'^^A„(i^k),a(j,i){Blk)ji+ ^ A„{k,l),a(i,j){Bkl)i] +'^^A„(^k,l),(j(i,j){Bkl)i 
k<l k=l = l 

n n 



E 

n 

E 

N 

[^Ap^qBp,q - ^p,95p,«Vi('T-i(p)),pr2(<T~i(p))] = trace(2AB^ - AB^) = {A, 2B - B) 

P,9=l 



k>l 



k=l=l 
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which proves the statement. We emphasize that in the fourth and sixth equahties we us^d the n- 
symmetry of B. The equahty (2.14) is proved in a strai ghtfo rwar d man ner by verifying that o E = 
Imjv and E o — Ig" using the defining expressions (2.2) and (2.14). ■ 



7.4 Proof of Proposition 3.1 



Using the property of the operator E stated in Proposition 2.3 the second equahty in (3.1) can be 
rewritten as: 



vech(iJf) 



or, equivalently: 



vech(math(c)) + Avech{zt^izJ_i) + i3vech(iJf_i) 
vech(math(c)) + vcch(E(A) • (zt_izf_i)) + vcch(E(B) • Ht-i), 

Ht = niath(c) + E(A) • (zt_izf_i) + E(5) • Ht-i. 



In view of this expression and in the terms of the statement of the proposition, it suffices to show 
that both E(A) • (zt_iz^j) and E(i3) • Ht-i are positive semidefinite provided that Ht-i is positive 
semidefinite. Regarding E(yl) • (Zf_izf_i), consider v e M" . Then 

(v, E(^) • (zt_izf_i)v) = ^ i;,;(E(A) • (zt_izf_i))iji)j = ^ Vitra,ce{Y.{A)ij{zt-i^J-i))vj 



= ^ -yitrace(zf_iE(yl)ijZ4_i)uj = ^ VizJ^^ ,,{'L{A)ij)kiZt-i^iVj 

— l i,j,k,l — l 

= (v(g)Zt_i,E(A)(v® zt_i)), 

which is greater or equal to zero due to the positive semidefiniteness hypothesis on E(^). In the last 
equality we used (2.10). 

As to E(_B) • Hf-i, we start by noticing that iJf_i — Et^i[zt^iz[_i] and hence Tj{B) • Ht^i — 
E(i?) • Et-i[zt-izf_i]. This equality, as well as the linearity of the conditional expectation allows us to 
use virtually the same argument as above. Indeed, for any v G K"^ 

(v, E(B) • i?t_iv) = ^ Vitra.cc{j:{B)ijEt-i[zt^izJ^^])vj ^ ^ Et-i[vitiace{J:{B)ijZt-izJ_^)vj] 

= £;t_i[(v(g)Zf_i,E(B)(v®zt_i))], 
which is greater or equal to zero due to the positive semidefiniteness hypothesis on 5](_B). ■ 



7.5 Proof of Proposition 3.2 



We start by noticing that the VEC(1,1) model is by construction a white noise and hence it suffices 
to establish the stationarity of the variance. Indeed, for any /i e N we compute the autocovariance 
function F: 



T{t,t + h):=E[ztzJ_,^]=E Et Hl'^tet+nHH^i 



,1/2 



= E 



H^^Et [etet+h] H^l^,^ 



1/2 



^ShoE 



„l/2„l/2 



(7.5) 
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Consequently, we just need to prove the existence of a solution for which r(t, t) = E \Ht\ or, equivalently 
i?[ht], is time independent. We first notice that 

E[\Yt] =E[c + Ar]t_^ + Bht-i] ^E[c + Aht-i + Bht-i]+AE [r7t_i - ht-i] =E[c + Aht-i + Bht^i] 



since AE [r]^_l - /it-i] = by (7.5 1. Now, for any fc > 

A; 

E[\,t] =c + {A + B)E [h,_i] = Y,{A + Bye +{A + Bf+^E [h^^k-i] ■ 

j=o 

If all the eigenvalues of A + i? are smaller than one in modulus then (see, for example |Lut05| Appendix 
A.9.1]) 

fc 

y"{A + Byc > {In - A- B)-^c, and {A + B)''+^E [ht-k-i] ^ 0, 

j=o 

in which case ^'[hf] is time independent and 

r(0) = math(£;[ht]) = math((lAr -A- B)-^c). 

The sufficient condition in terms of the top singular value crmax(^ + ^) of A + _B is a consequence of 
the fact that (see for instance |HJ94[ Theorem 5.6.9]) |A(yl + B)\ < (JmaxiA + B), for any eigenvalue 
X{A + B)ofA + B. m 



7.6 Proof of Proposition 3.3 



The chain rule implies that for any perturbation A in the 9 direction 

deh ■ A = dnMHtm ' TeH^ ■ A - {Vn^uTgHt ■ A) - {T^Ht ■ VhJu^), 



which proves that Vek — TgHt ■ ^ nji and hence (3.8) follows. We now establish (3.91 by showing 
separately that 



Vh, log(det(ifO) = i?r' and V^, (^-^zf i/r'^t) - \ {Hy^t^J H^^) 



(7.6) 



In order to prove the first expression we start by using the positive semidefinite character of Ht in order 
to write Ht — VDV^ . V is an orthogonal matrix and D is diagonal with non-negative entries; it has 
hence a unique square root D^^^ that we can use to write Ht = VDV^ — {VD^/^){VD^^'^)'^ . Let ^ e M 
and A e S„. We have 

log(det(i7t + SA)) = log{det{{VD^/^){VD^^Y + SA)) 

= log(det((rai/')(I„ + S{D-^^^V^)A{VD-^/^)){VD^^Y)) 

= log(det(yDi/2) det(I„ + S{D-^^^V'^)A{VD-^/^)) det{VD^/Y) 

^ log{det{{V D^/^)iVD^^Y) det(I„ + diD-^^^V^)AiVD-^^Y 

= log(det(i/t)det(I„ + 5S)), 

with 5 :— {D~^/^V'^)A{VD~^^^). This matrix is symmetric and hence normal and diagonalizable; let 
{Ai, . . . , A„} be its eigenvalues. We hence have that 



dHt ■ A 



d 
dS 



log(det(fft+5A)) 



dS 



(5=0 



log(det(ffO)+log n(l + '^A,) 



d 
dS 



^log(l + a. 



5=0 ,^1 



^ A, = tTace{{D-^/'^V^)A{VD'^^^)) = tTace{{VD'^^^){D-^^^^V'^)A) = trace{Hy^A), 



i=l 
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which proves V//j log(det(i?t)) = ^. Regarding the second expression in (7.6) we define f{Ht) := 



— |z^iJ( ^zt and note that 



df{Ht) ■ A 



1 



t=0 



d 

dt 

d_ 
~dt 



zi{lr, + tH^'A)-'H,-'zt 



zJ{I„+tH^^A)-^Hi^zt 



dt 



^ OO 



fc=0 



which imphes that V Htf = h. i^t ^'^t^fHi as required. 



In order to prove ( 3.10 1-( 3.12) we notice that the second equation in (3.1) can be rewritten using 



the vech and math operators as 



Ht = math (c + Arif_i + Bvech{Ht-i)) 



(7.7) 



We now show (3.10). Let v e M and A e S„ arbitrary. Identity (7.7) and the linearity of the various 
mappings involved imply that T^Ht ■ v = math (v + _Bvech(Tci/t_i • v)) and hence 



{T:HfA,v) 



(A, T,Ht ■ v) = (A, math (v + Bvcc\i{T^Ht-i ■ v))) 
(math*(A) + T*Ht-i ■ vech*(B^niath*(A)), v>. 



The proof of (3.11 ) follows a similar scheme. By \I.7\ we have that for any M e Mjv: 

TaHi ■ M = math {Alrjt^^ + Svech(TAi?t-i • M)) . 

Consequently, for any A g S„ 

{TXHt ■ A, M) = (A, TAHt) - (A, math (Afry^^i + Bvcch{TAHt-i ■ M))) 
= (math* (A) • r]J_^ + T'XHt-i ■ vech*(B^math*(A)), A). 



(7.8) 



Finally, (3.12) is proved analogously replacing (7.8) by its B counterpart, namely, 
TsHt ■ M = math (Mvech(i7t_i) + Bvech(TBi/t-i ■ A/)) . ■ 



7.7 Proof of Proposition 3.4 



An inductive argument using ( |3.10 )-(3.12) guarantees that for any t, k £ N, k < t 

k 

T^H* ■ A = ^B'-i^math*(A) +T<!iJf_fc • vech*(B'^^math*(A)), 

i=l 
k 

TXHfA = ^B'-i^math*(A)-J7f_, +rliJf_fc-vech*(B''^'^math*(A)), 



(7.9) 
(7.10) 



i=l 
k 



T*BHfA = ^B'-i^math*(A) •vech(i/t_,)'^ + T^fft_fc •vech*(B'=^math*(A)), (7.11) 



The first expression with k = t and the norm estimate (2.8) imply that 

t 

^S*~i^math*(A) 



\TcH: ■ A I 



<^E™p^IIa|I<y^ 



(7.12) 



op 
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We now use (7.9) for an arbitrary k as well as (2.7) and (7.12) and write 
\\iT:Ht ~ T:H^) ■ All = \\T:Ht-k ■ vech*(i3^-^math*(A))|| 



2||A||||B||^p 



< ||T:i7t_fe||op||vech*||op||i?||Sp||math*||op||A|| < ' „,| ^ (7.13) 

l-ljiillop 

The computability constraint (CC) implies that ||i?||op < 1 - es and hence \\T*Ht — T*Hf\\op < 
2(1 — tB)^/f^B- A straightforward computation shows that if we want this upper bound for the error to 
be smaller than a certain i5 > 0, that is 2(1 — €bY /'^b < S then it suffices to take 



k > 



log (¥) 
log(l -?b)' 



(7.14) 



We now tackle the estimation of the truncation error in mean in the A variable. Firstly, we recall that 
by (7.5) and in the presence of the stationarity constraint E[rif.] = E[h.t] = {In — A — B)~^c. The first 
consequence of this identity is that if we take the expectations of both (7.10) and (7.11) we see that 
\\E [T^Ht • A] II and \\E [T^Ht • A] || are determined by exactly the same recursions and hence the error 
estimations for both variables are going to be the same. Also, by (17. 101) 



\E[TXHfA]\\ = 



J2B'-^^meith*iA)-E['n 



T 1 



1=1 



< 



y2||A||||£;[h,]||^||i?||-i 



< V2\\A\\\\{In~A-B)-'c\\/7b < \/2||A||||c||/eAB?i3. (7.15) 
The last inequality is a consequence of the constraints (SC) and (PC). Indeed, 



\\ilN-A-B)-'c\\ 



J2iA + Byc 



i=0 



<£||(A + i?)||;p||c||<£(l-6^B)lc|| 



i=0 



i=0 



£AB 



Now, by (7.10) and (7.15 



11^ [{TXHt - TXHi^) • A] II = ||£; [TXHt^k ■ vech*(S'=^math*(A))] || 

< ||rli/t_fe||op||vech*||op||B||^p||math*||op||A|| < 



^AB^B 



which proves (3.18). If we want this upper bound for the error to be smaller than a certain 5 > 0, we 
have to make the number of iterations k big enough so that 



2||c] 

^AB^B 



(1-?b)''<(5 that is (1-es)* 



SeAB^B ^ SeAB^B 



2||c|| 



< 



2e, 



This relation, together with (7.14) proves the estimate (3.20). 
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